Evidence-Based Information Extraction for High Accuracy Citation and Author Name Identification

Authors

  • Brett Powley
  • Robert Dale
Abstract

Citations play an essential role in navigating academic literature and following chains of evidence in research. With the growing availability of large digital archives of scientific papers, the automated extraction and analysis of citations is becoming increasingly relevant. However, existing approaches to citation extraction still fall short of the high accuracy required to build more sophisticated and reliable tools for citation analysis and corpus navigation. In this paper, we present techniques for high-accuracy extraction of citations and references from academic papers. By collecting multiple sources of evidence about entities from documents, and integrating citation extraction, reference segmentation, and citation–reference matching, we are able to significantly improve performance in subtasks including citation identification, author named entity recognition, and citation–reference matching. Applying our algorithm to previously unseen documents, we demonstrate high F-measure performance of 0.980 for citation extraction, 0.983 for author named entity recognition, and 0.948 for citation–reference matching.

Related resources

A survey of the accuracy of cited articles in specialty medical doctorate theses at Tehran University of Medical Sciences

Background and Aim: Citation can be considered the basis of scientific research. Researchers use citations either to show that their findings correspond with the truth or to point readers to further references. Maintaining the informational link that citations provide is essential, and theses are no exception. This study was done to review the ...

Author gender identification from text using Bayesian Random Forest

The heavy use of virtual environments, and the connections users form through social networks such as Facebook, Instagram, and Twitter, make it more necessary than before to find shared subjects in these environments. Several applications benefit from reliable methods for inferring the age and gender of users in social media. Such applications exist across a wide range of fields,...

Citation Analysis of the Most Influential Publications in Travel Medicine

Introduction: Citation analysis reflects the extent to which published work has been recognized in the scientific community. The purpose of this study was to characterize the most cited publications in travel medicine. Methods: Travel medicine articles indexed on Scopus which had been published in the English language through 2016 were retrieved independen...

Behavioral Analysis of Traffic Flow for an Effective Network Traffic Identification

Fast and accurate network traffic identification is becoming essential for network management, high quality of service control and early detection of network traffic abnormalities. Techniques based on statistical features of packet flows have recently become popular for network classification due to the limitations of traditional port and payload based methods. In this paper, we propose a metho...

Author name disambiguation: What difference does it make in author-based citation analysis?

In this paper, we explore how strongly author name disambiguation (AND) affects the results of an author-based citation analysis study, and identify conditions under which the commonly used simplified approach of using surnames and first initials may suffice in practice. We compare author citation ranking and co-citation mapping results in the stem cell research field 2004-2009 between two AND ...

Publication date: 2007